Abstract: The Autonomous System (AS) topology of the Internet
(up to 61k ASs) is growing at a rate of about 10% per
year. The Border Gateway Protocol (BGP) starts to show its 
limits in terms of the number of routing table entries it can 
dynamically process and control. Due to the increasing routing 
information processing and storage, the same trend is observed 
for routing model simulators such as DRMSim specialized in 
large-scale simulations of routing models. Therefore, DRMSim 
needs enhancements to support the current size of the Internet 
topology and its evolution (up to 100k ASs). To this end, this
paper presents a feasibility study of extending DRMSim
to support the Distributed Parallel Discrete Event
paradigm. We first detail the possible distribution models and 
their associated communication overhead. Then, we analyze this 
overhead by executing BGP on a partitioned topology according 
to different scenarios. Finally, we conclude on the feasibility of 
such a simulator by computing the expected additional time 
required by a distributed simulation of BGP compared to its 
sequential simulation. 
I. Introduction

The Internet evolution pushes its routing system to its limits 
in terms of i) memory cost due to the size of the routing 
tables (RT); ii) communication cost or complexity; iii) Border 
Gateway Protocol (BGP) complexity. Routing research has 
investigated new routing paradigms to address these issues. 
However, simulating stateful distributed routing protocols
that operate asynchronously becomes a real issue at large scale
(on the order of 10k nodes) [9]. No simulator provides the means
to measure and to characterize the performance and behavior 
of such routing protocols on large networks and to compare 
them with BGP on the same simulation environment. For this 
purpose, we study the extension of DRMSim [3] in order to 
support the distribution of the routing model by partitioning 
the topology with respect to its properties and by extending 
the communication model. 

This paper is organized as follows. After describing in
Section II the state of the art in routing model simulation
and DRMSim, we detail in Section III two distribution models together with
their associated communication overhead. In Section IV, we
describe our simulation scenarios and execution environment,
followed by the simulation results and an analysis of the impact of
topology partitioning on communication.
Finally, we conclude on the feasibility of such a simulator in
Section V.



II. State of the art 

We can distinguish three classes of routing simulators: i)
protocol simulators, dedicated to performance measurement
and analysis of the routing protocol (procedures and format)
at the microscopic level, such as NS [5] and SSFNet [7];
ii) configuration simulators, dedicated to the simulation of
BGP protocol specifics, such as SimBGP [8] and C-BGP [6];
iii) routing model simulators, such as DRMSim [3], which do not
execute the low-level procedures of the protocol but an abstraction of them.
Designed for investigating the performance of dynamic
routing models on large-scale networks, these simulators allow
the execution of different routing models and enable comparison
of their resulting performance.

Routing model simulators require the specification of an abstract
procedural model, data model, and state model that are sufficiently
simple to be effective on large-scale networks but
still representative of the actual protocol execution. However,
incorporating routing state information (and keeping it up to date)
is technically challenging because of the amount of
memory required to store the data associated with each state.

DRMSim implements the Discrete-Event Simulation (DES) 
approach. It addresses the previous issue by means of efficient
graph-based data structures. It allows us to define
simulation scenarios in which topology dynamics such as link/router
failures are considered. Several models can be easily developed
and integrated with their own metric model for performance 
measurement. The provided set of models includes BGP, the 
Routing Information Protocol (RIP), and compact routing 
schemes such as NSR [4] and AGMNT [1].

III. Distribution models 

Implementing DRMSim as a distributed parallel discrete
event simulator requires distributing not only the routers with their data
structures but also the execution of events among the different
computational resources. The feasibility of the simulation depends
mainly on the number of events to be transmitted between
logical processes (LPs) and on the available bandwidth between
them.

The main problems observed with the non-distributed version
of DRMSim are:

• Routing tables (RT): let $n$ be the number of routers in the
network and $k$ the size of a routing entry. The memory
needed to store all the routing tables is $O(k \cdot n^2)$.

• Entry updates: these are the events with the highest number of
occurrences. Simulations always start from empty routing tables.
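
As a rough illustration of this quadratic growth, the following sketch evaluates $k \cdot n^2$ for a few topology sizes. The entry size of 16 bytes is a hypothetical value chosen for the example, not a figure taken from DRMSim, and the function name is ours.

```python
def routing_table_memory_bytes(n_routers: int, entry_size_bytes: int) -> int:
    """Memory needed if each of the n routers stores one entry per router."""
    return entry_size_bytes * n_routers * n_routers


if __name__ == "__main__":
    ENTRY_SIZE = 16  # hypothetical entry size in bytes (assumption)
    for n in (5_000, 61_000, 100_000):
        gib = routing_table_memory_bytes(n, ENTRY_SIZE) / 2**30
        print(f"n = {n:>7}: {gib:8.1f} GiB")
```

Doubling the number of routers multiplies the total by four, which is why storing all routing tables in a single process becomes the limiting factor.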



Routers and BGP peering sessions are distributed among the 
different logical processes. We model the system by a graph 
G(V,E), where the set of vertices V is partitioned and every 
partition is managed by a logical process. The edges internal to 
a partition are known only by the corresponding LP. For edges 
with end-points in distinct partitions, there are two possible 
solutions. 

• Solution A: update events including the corresponding 
entries are transmitted between LPs. 

• Solution B: boundary edge end-points are duplicated,
implying that the original vertices (for example, v and
w in Fig. 1) must synchronize their copies (v' and w').

The advantage of Solution B over Solution A is
that only modified RT entries are sent, which reduces the
communication cost. However, duplicating vertices (routers)
may be very costly if the cuts produced by the partitioning
algorithm contain many edges, which in turn reduces the memory gain.

A. Partitioning algorithms and complexity 

We assume that BGP routing updates constitute the main
problem to solve and that other types of events have negligible
effects. For convenience, we use the following notation:

• $G(V, E)$ = the graph representing the network topology;

• The vertices of the graph are partitioned: $V = \bigcup_{i=1}^{K} V_i$, where $K$ is the number of partitions;

• $N_{V'}(v)$ = the set of neighbors of a vertex $v$ in the subgraph
induced by $V'$;

• $ME(v)$ = the set of modified RT entries at the router
modeled by $v$;

• $e(v, V') = 1$ if $v$ is not in $V'$ and has a neighbor in $V'$,
and $0$ otherwise;

• $E_{size}$ = the average size of an RT entry.

Solution A implies no internal communication overhead, but
leads to a communication volume of
$\sum_{i=1}^{K} \sum_{v \in V_i} \left( |N_{V \setminus V_i}(v)| \cdot |ME(v)| \right)$
between LPs. In Solution B, as boundary nodes are duplicated,
there are
$\sum_{i=1}^{K} \sum_{v \in V_i} \left[ \left( \sum_{j=1}^{K} e(v, V_j) \right) \cdot |ME(v)| \right]$
internal updates, but the communication between LPs is negligible since
we only need to send an update event containing the identifiers
of both routers. There is no memory overhead in Solution A,
compared to the
$|V| \cdot \sum_{i=1}^{K} \sum_{v \in V_i} \left[ \sum_{j=1}^{K} e(v, V_j) \right]$
bits that are needed in Solution B to store the duplicated boundary
nodes.
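
To make these quantities concrete, the sketch below evaluates them on a small partitioned graph. It assumes that a vertex is duplicated once in every foreign partition in which it has at least one neighbor, and that `modified[v]` stands for $|ME(v)|$; the function names and the dictionary-based graph representation are illustrative choices, not part of DRMSim.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def solution_a_communication(graph: Graph, part_of: dict, modified: dict) -> int:
    """Sum over v of (number of neighbors outside v's partition) * |ME(v)|."""
    total = 0
    for v, neighbors in graph.items():
        foreign = sum(1 for u in neighbors if part_of[u] != part_of[v])
        total += foreign * modified[v]
    return total


def foreign_partitions(graph: Graph, part_of: dict, v) -> set:
    """Partitions other than v's own that contain a neighbor of v."""
    return {part_of[u] for u in graph[v]} - {part_of[v]}


def solution_b_internal_updates(graph: Graph, part_of: dict, modified: dict) -> int:
    """Sum over v of (number of copies of v) * |ME(v)|."""
    return sum(len(foreign_partitions(graph, part_of, v)) * modified[v]
               for v in graph)


def solution_b_duplicates(graph: Graph, part_of: dict) -> int:
    """Total number of duplicated boundary vertices over all partitions."""
    return sum(len(foreign_partitions(graph, part_of, v)) for v in graph)


if __name__ == "__main__":
    # Star: hub "h" alone in partition 0, three leaves in partition 1.
    graph = {"h": {"x", "y", "z"}, "x": {"h"}, "y": {"h"}, "z": {"h"}}
    part_of = {"h": 0, "x": 1, "y": 1, "z": 1}
    modified = {"h": 10, "x": 1, "y": 1, "z": 1}
    print(solution_a_communication(graph, part_of, modified))
    print(solution_b_internal_updates(graph, part_of, modified))
    print(solution_b_duplicates(graph, part_of))
```

On this star example, the hub sends its modified entries once per cut edge in Solution A but only once per foreign partition in Solution B, at the price of four duplicated vertices.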

Fig. 1. Two solutions to allocate edges with end-points on different LPs

We use mixed integer programs (MIP) to compute optimal
bipartitions of the topology by minimizing the communication
overhead. In Solution A, we assign to every edge $uv \in E(G)$
a weight $W_{uv}$ that approximates the number of entries exchanged
on this edge. The sum of weights over all edges
having end-points in different subsets is minimized. In Solution
B, we assign to every vertex $v \in V(G)$ a weight $W_v$ that
approximates $|ME(v)|$. In this case, the sum of weights over
all vertices having at least one neighbor in the foreign subset
is minimized.
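
The two objectives can be illustrated on a toy graph with an exhaustive search. This is not the MIP formulation used here: it is a brute-force stand-in that is only practical for a handful of vertices. The `min_size` balance constraint is an assumption added for the example (without some size constraint, placing every vertex on one side trivially gives zero cost), and all names are hypothetical.

```python
from itertools import combinations


def cut_weight(edges: dict, side_a: frozenset) -> float:
    """Solution A objective: weight of edges whose end-points are on different sides."""
    return sum(w for (u, v), w in edges.items()
               if (u in side_a) != (v in side_a))


def boundary_weight(edges: dict, vertex_weight: dict, side_a: frozenset) -> float:
    """Solution B objective: weight of vertices with a neighbor on the other side."""
    boundary = set()
    for (u, v) in edges:
        if (u in side_a) != (v in side_a):
            boundary.update((u, v))
    return sum(vertex_weight[v] for v in boundary)


def best_bipartition(vertices, objective, min_size: int):
    """Try every bipartition with at least min_size vertices per side.

    Returns (cost, side_a) for the cheapest one, or None if none is feasible.
    """
    vertices = sorted(vertices)
    best = None
    for size in range(min_size, len(vertices) - min_size + 1):
        for subset in combinations(vertices, size):
            side_a = frozenset(subset)
            cost = objective(side_a)
            if best is None or cost < best[0]:
                best = (cost, side_a)
    return best


if __name__ == "__main__":
    # Two triangles joined by a single light edge (3, 4).
    edges = {(1, 2): 5, (2, 3): 5, (1, 3): 5,
             (4, 5): 5, (5, 6): 5, (4, 6): 5,
             (3, 4): 1}
    weights = {v: 1 for v in range(1, 7)}
    print(best_bipartition(range(1, 7), lambda s: cut_weight(edges, s), 3))
    print(best_bipartition(range(1, 7),
                           lambda s: boundary_weight(edges, weights, s), 3))
```

Both objectives select the split along the light edge in this example, but they can disagree in general, since Solution A penalizes every cut edge while Solution B penalizes each boundary vertex once.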

IV. Execution 

Our objective is to determine the communication overhead 
of BGP caused by a distributed parallel implementation of 
DRMSim. We count the number of BGP update messages 
exchanged at the boundary edges of the topology partitions. 
The communication time overhead is derived by estimating 
the transmission and the propagation time for a single BGP 
update message between two LPs. 

A. Simulation scenarios 

We first consider MinRouteAdvertisementInterval (MRAI)
= 0 s in our experiments. This setting yields an upper bound on the
amount of communication between BGP routers. In
the Internet, by contrast, the default MRAI value is 30 s in order to
limit the message rate between BGP peers.

The BGP peering session establishment delay together with 
the update propagation delay between routers play a major 
role in the amount of transmitted updates. Three scenarios 
have been elaborated: 

• Scenario 1: all BGP peering sessions are established
before the exchange of updates starts. Once sessions
are established, received updates are executed in their
scheduled order.

• Scenario 2: as in Scenario 1, except that, upon reception,
updates are executed in a random order, simulating
highly random communication delays.

• Scenario 3: after one peering session is established, the
resulting updates are executed. Then, the next scheduled
BGP peering session is established.
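
The difference between executing updates in scheduled order and in random order can be sketched with a minimal event queue. This is only an illustration of the two orderings, not DRMSim's scheduler; the event representation as `(time, name)` pairs and the function names are assumptions made for the example.

```python
import heapq
import random


def scheduled_order(events):
    """Execute events by increasing scheduled time (ties keep insertion order)."""
    heap = [(time, seq, name) for seq, (time, name) in enumerate(events)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


def random_order(events, seed):
    """Ignore scheduled times and execute the received events in random order."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    names = [name for _, name in events]
    rng.shuffle(names)
    return names


if __name__ == "__main__":
    updates = [(3, "update-c"), (1, "update-a"), (2, "update-b")]
    print(scheduled_order(updates))
    print(random_order(updates, seed=42))
```

Scenario 3 would additionally interleave session establishment with update execution, which this sketch does not model.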

These scenarios are executed once on topologies of 2.5k, 3k, 
3.5k, 4k, 4.5k and 5k nodes. These topologies are generated 
according to the Generalized Linear Preferential (GLP) model 
parameterized as specified in [2].

B. Execution environment 

Sequential simulations of BGP, used to derive inter-partition
communication, were executed with DRMSim on an Intel
Xeon 3.20 GHz with 64 GB of RAM. On topologies of 5k nodes,
execution took 52 minutes (min) for Scenario 1, 48 min for
Scenario 2, and 16 min for Scenario 3.

C. Simulation results 

To obtain reference values, we first measure the number of
BGP updates, together with their respective number of entries, on
non-partitioned topologies for the different scenarios.



TABLE I 
Number of update entries 
1) BGP reference results on non-partitioned topologies:
Table I shows that, on non-partitioned topologies, the number of
BGP update entries increases drastically with the topology size. If the communication
overhead between LPs behaves similarly, then the time to perform
simulations becomes excessive. This observation shows
that setting the MRAI value to 0 s has detrimental effects in
terms of routing convergence time. We also observe that the
increase of BGP update entries with the topology size is linear
in their square root, allowing us to extrapolate this number to
the topology size of interest (100k nodes).

2) BGP results on bipartitioned topologies: We compute
optimal bipartitions of the topologies according to the methodology
presented in Section III-A for Solutions A and B. Since the
bipartitions are optimal, the measured communication
gives us a lower bound on the overhead.

In Solution A, for the three scenarios and a topology of 5k
routers, around 10% of the total number of updated entries
transit between partitions. In Solution B, we observe for the
three scenarios that the number of transmitted entries decreases
compared to Solution A: only 5.6% of the reference updated
entries have to be transmitted for Scenario 1, 5% for Scenario
2, and 8% for Scenario 3.

3) Communication overhead: We measure the total size 
of updated entries transmitted for the different scenarios and 
topologies. The average size of each entry is 1 for Scenario
1 and between 2.8 and 4.5 for Scenarios 2 and 3.
We also need to determine the time to transmit an updated
entry between two LPs. Considering a 1 Gbps link using TCP/IP,
we obtain an average of 0.26 ms for each packet (over 1000
packets sent one after the other) to reach its destination.
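
A minimal sketch of how such a per-packet time translates into a total transmission time is given below. It assumes that packets are sent strictly one after the other and that the measured 0.26 ms average applies to every packet; how entries map to packets is not specified here, so the function works on packet counts. The function names are hypothetical.

```python
PER_PACKET_MS = 0.26  # average per-packet time reported above (1 Gbps, TCP/IP)


def transmission_minutes(num_packets: float,
                         per_packet_ms: float = PER_PACKET_MS) -> float:
    """Total time, in minutes, to send num_packets sequentially."""
    return num_packets * per_packet_ms / 1000.0 / 60.0


def packets_for_minutes(minutes: float,
                        per_packet_ms: float = PER_PACKET_MS) -> float:
    """Inverse: number of sequential packets that fit in a given duration."""
    return minutes * 60.0 * 1000.0 / per_packet_ms


if __name__ == "__main__":
    print(f"{transmission_minutes(1_000_000):.2f} min for one million packets")
    print(f"{packets_for_minutes(60):,.0f} packets per hour")
```

Under these assumptions, one million packets cost a little over four minutes, so overheads in the range of hours correspond to hundreds of millions of packets.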

With Solution A applied to Scenario 1 on a topology of 5k
routers, we can expect 45 min to transmit all updated entries
between the LPs. By extrapolation, we can expect around
91 min of overhead for a topology of 10k routers and 15 hours
for 100k routers. For the two other scenarios executed on a
topology of 5k routers, Scenario 2 accumulates an overhead of
110 min and Scenario 3 remains in the same order of time
as Scenario 1. By extrapolating the Scenario 2 overhead, we can
expect about 4 h of overhead for a topology of 10k routers and
about 36 hours for 100k routers.
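
The extrapolation method is not detailed here. As a sanity check only, the sketch below applies a plain proportional scaling to the 45 min and 110 min overheads measured on 5k routers. This scaling is an assumption made for the example and not necessarily the method used to obtain the figures above, but it gives values of the same order: 90 min and 15 h from the 45 min measurement, and about 3.7 h and 36.7 h from the 110 min measurement.

```python
def scale_overhead(minutes_at_base: float, base_nodes: int, target_nodes: int) -> float:
    """Scale an overhead proportionally to the number of nodes (assumption)."""
    return minutes_at_base * target_nodes / base_nodes


if __name__ == "__main__":
    BASE_NODES = 5_000
    for label, base_minutes in (("45 min case", 45.0), ("110 min case", 110.0)):
        for target in (10_000, 100_000):
            minutes = scale_overhead(base_minutes, BASE_NODES, target)
            print(f"{label}, {target:>7} nodes: "
                  f"{minutes:7.0f} min = {minutes / 60.0:5.1f} h")
```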



V. Conclusion

The expansion of the Internet and its dynamics require
handling the evolution of its routing system over the next
decade. Moving DRMSim to a Distributed Parallel Discrete
Event simulator appears to be a promising way to
abstract away the topology size, although it induces communication
overhead between logical processes. This paper therefore
quantifies the expected additional time needed to
distribute the simulation of BGP on topologies representative
of this growth, i.e., 10k to 100k nodes. For this purpose, we
identified BGP updates as the main cause of communication
between nodes. We computed the number of updates and
their respective number of entries for different BGP execution
scenarios. Then, using the measured number of updates, we
computed the best bipartition of each topology
to derive the amount of communication (the overhead)
between partitions. For Scenario 1, distributing
the simulation between partitions requires an additional time
ranging from 91 min (10k nodes) to 15 hours (100k nodes). As
shown by Scenarios 2 and 3, this additional time
clearly depends on the execution scenario and can reach up to
36 hours for topologies of 100k nodes.

The number of updates can be decreased by increasing the MRAI to
a value higher than 0. As the MRAI increases, the number
of BGP updates decreases significantly (on the order of a factor of
10), until the convergence time reaches an average minimum.
The critical issue thus becomes determining the optimal
MRAI value, for which the decrease in update rate would
not increase the convergence time while limiting the amount
of additional required memory. To further reduce the
overhead, we proposed synchronizing boundary nodes
between partitions. By applying this method, we were able to
save half of the additional time for Scenarios 1 and 2.
Consequently, such a distributed parallel simulator seems
feasible, and the next step of our work is the
extension of our routing model simulator DRMSim according
to the proposed distribution models.